Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm
نویسندگان
چکیده
MOTIVATION Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not incorporate and use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences. RESULTS We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses. AVAILABILITY PAGAN is written in C++, licensed under the GPL and its source code is available at http://code.google.com/p/pagan-msa.
منابع مشابه
Co-estimation of Phylogeny-aware Alignment and Phylogenetic Tree
The phylogeny-aware alignment algorithm implemented in both PRANK and PAGAN has been found to produce highly accurate alignments for comparative sequence analysis. However, the algorithm’s reliance on a guide tree during the alignment process can bias the resulting alignment rendering it unsuitable for phylogenetic inference. To overcome these issues, we have developed a new tool, Canopy, for p...
متن کاملEffects of Gap Open and Gap Extension Penalties
Fundamental to multiple sequence alignment algorithms is modeling insertions and deletions (gaps). The most prevalent model is to use gap open and gap extension penalties. While gap open and gap extension penalties are well understood conceptually, their effects on multiple sequence alignment, and consequently on phylogeny scores are not as well understood. We use exhaustive phylogeny searching...
متن کاملEfficient Construction of accurate Multiple alignments and Large-Scale phylogenies
A central focus of computational biology is to organize and make use of vast stores of molecular sequence data. Two of the most studied and fundamental problems in the field are sequence alignment and phylogeny inference. The problem of multiple sequence alignment is to take a set of DNA, RNA, or protein sequences and identify related segments of these sequences. Perhaps the most common use of ...
متن کاملDendroBLAST: Approximate Phylogenetic Trees in the Absence of Multiple Sequence Alignments
The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of in...
متن کاملIntegration of Alignment and Phylogeny in the Whole-Genome Era
OF THE DISSERTATION Integration of Alignment and Phylogeny in the Whole-Genome Era by Hongtao Sun Doctor of Philosophy in Computer Science Washington University in St. Louis, 2015 Professor Jeremy Buhler, Chair With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 28 شماره
صفحات -
تاریخ انتشار 2012